Multimodal Speaker Detection Using Input/Output Dynamic Bayesian Networks

نویسندگان

  • Vladimir Pavlovic
  • Ashutosh Garg
  • James M. Rehg
چکیده

Inferring users’ actions and intentions forms an integral part of design and development of any human-computer interface. The presence of noisy and at times ambiguous sensory data makes this problem challenging. We formulate a framework for temporal fusion of multiple sensors using input–output dynamic Bayesian networks (IODBNs). We find that contextual information about the state of the computer interface, used as an input to the DBN, and sensor distributions learned from data are crucial for good detection performance. Nevertheless, classical DBN learning methods can cause such models to fail when the data exhibits complex behavior. To further improve the detection rate we formulate an errorfeedback learning strategy for DBNs. We apply this framework to the problem of audio/visual speaker detection in an interactive kiosk application using ”offthe-shelf” visual and audio sensors (face, skin, texture, mouth motion, and silence detectors). Detection results obtained in this setup demonstrate numerous benefits of our learning-based framework.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Floor holder detection and end of speaker turn prediction in meetings

We propose a novel fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the s...

متن کامل

Boosted Learning in Dynamic Bayesian Networks for Multimodal Speaker Detection

Bayesian network models provide an attractive framework for multimodal sensor fusion. They combine an intuitive graphical representation with efficient algorithms for inference and learning. However, the unsupervised nature of standard parameter learning algorithms for Bayesian networks can lead to poor performance in classification tasks. We have developed a supervised learning framework for B...

متن کامل

Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks

Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs) which allow the power of statistical inference and learning to be com...

متن کامل

Multimodal Speaker Detection Using Error Feedback Dynamic Bayesian Networks

Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs) which allow the power of statistical inference and learning to be com...

متن کامل

Control of Dynamic Systems Using Bayesian Networks

Bayesian networks for the static as well as for the dynamic case have gained an enormous interest in the research community of artificial intelligence, machine learning and pattern recognition. Although the parallels between dynamic Bayesian networks and Kalman filters are well known since many years, Bayesian networks have not been applied to problems in the area of adaptive control of dynamic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000